Abstract

We suggest a new algorithm for two-person zero-sum undiscounted stochastic games focusing on stationary strategies. Given a positive real $\varepsilon$, let us call a stochastic game $\varepsilon$-ergodic if its values from any two initial positions differ by at most $\varepsilon$. The proposed new algorithm outputs for every $\varepsilon > 0$ in finite time either a pair of stationary strategies for the two players guaranteeing that the values from any initial positions are within an $\varepsilon$-range, or identifies two initial positions $u$ and $v$ and corresponding stationary strategies for the players proving that the game values starting from $u$ and $v$ are at least $\varepsilon/24$ apart. In particular, the above result shows that if a stochastic game is $\varepsilon$-ergodic, then there are stationary strategies for the players proving $24\varepsilon$-ergodicity. This result strengthens and provides a constructive version of an existential result by Vrieze (1980) claiming that if a stochastic game is 0-ergodic, then there are $\varepsilon$-optimal stationary strategies for every $\varepsilon > 0$. The suggested algorithm is based on a potential transformation technique that changes the range of local values at all positions without changing the normal form of the game.
1 Introduction


1.1 Basic Concepts and Notation

Stochastic games were introduced in 1953 by Shapley [Sha53] for the discounted case, and extended to the undiscounted case by Gillette [Gil57]. Each such game $\Gamma = (p^{vu}_{k\ell}, r^{vu}_{k\ell} \mid k \in K^v,\ \ell \in L^v,\ u, v \in V)$ is played by two players on a finite set $V$ of vertices (states, or positions); $K^v$ and $L^v$ for $v \in V$ are finite sets of actions (pure strategies) of the two players; $p^{vu}_{k\ell} \in [0,1]$ is the transition probability from state $v$ to state $u$ if the players choose actions $k \in K^v$ and $\ell \in L^v$ at state $v \in V$; and $r^{vu}_{k\ell} \in \mathbb{R}$ is the reward player 1 (the maximizer) receives from player 2 (the minimizer), corresponding to this transition. We assume that the game is non-stopping, that is, $\sum_{u \in V} p^{vu}_{k\ell} = 1$ for all $v \in V$ and $k \in K^v$, $\ell \in L^v$. To simplify later expressions, let us denote by $P^{vu} \in [0,1]^{K^v \times L^v}$ the transition matrix, the elements of which are the probabilities $p^{vu}_{k\ell}$, and associate in $\Gamma$ a local expected reward matrix $A^v$ to every $v \in V$ defined by

\[ (A^v)_{k\ell} = \sum_{u \in V} p^{vu}_{k\ell}\, r^{vu}_{k\ell}. \tag{1} \]


In the game $\Gamma$, players first agree on an initial vertex $v_0 \in V$ to start. Then, in a general step $j = 0, 1, \ldots$, when the game arrives at state $v_j = v \in V$, they choose mixed strategies $\alpha^v \in \Delta(K^v) := \{y \in \mathbb{R}^{K^v} \mid \sum_{i \in K^v} y_i = 1,\ y_i \ge 0 \text{ for } i \in K^v\}$ and $\beta^v \in \Delta(L^v)$, player 1 receives the amount of $b_j = \alpha^v A^v \beta^v$ from player 2, and the game moves to the next state $u$ chosen according to the transition probabilities $\alpha^v P^{vu} \beta^v$.

The undiscounted limiting average (effective) payoff is the Cesaro average

\[ g^{v_0}(\Gamma) = \liminf_{N \to \infty} \frac{1}{N+1} \sum_{j=0}^{N} \mathbb{E}[b_j], \tag{2} \]

where the expectation is taken over all random choices made (according to mixed strategies and transition probabilities) up to step $j$ of the play. The purpose of player 1 is to maximize $g^{v_0}(\Gamma)$, while player 2 would like to minimize it.

In 1981, Mertens and Neyman in their seminal paper [MN81] proved that every stochastic game has a value from any initial position in terms of history dependent strategies. An example (the so-called Big Match) showing that the same does not hold when restricted to stationary strategies was given in 1957 in Gillette's paper [Gil57]; see also [BF68].

In this paper we shall restrict ourselves (and the players) to the so-called stationary strategies, that is, the mixed strategy chosen in a position $v \in V$ can depend only on $v$ but not on the preceding positions or moves before reaching $v$ (i.e., not on the history of the play). We will denote by $\mathcal{K}(\Gamma)$ and $\mathcal{L}(\Gamma)$ the sets of stationary strategies of player 1 and player 2, respectively, that is,

\[ \mathcal{K}(\Gamma) = \bigotimes_{v \in V} \Delta(K^v) \quad\text{and}\quad \mathcal{L}(\Gamma) = \bigotimes_{v \in V} \Delta(L^v). \]


Vrieze (1980) showed that if a stochastic game $\Gamma$ has a value $g^{v_0}(\Gamma) = m$, which is a constant, independent of the initial state $v_0 \in V$, then it has a value in $\varepsilon$-optimal stationary strategies for any $\varepsilon > 0$. We call such games ergodic and extend their definition as follows.

Definition 1 For $\varepsilon > 0$, a stochastic game $\Gamma$ is said to be $\varepsilon$-ergodic if the game values from any two initial positions differ by at most $\varepsilon$, that is, $|g^v(\Gamma) - g^u(\Gamma)| \le \varepsilon$ for all $u, v \in V$. A 0-ergodic game will be simply called ergodic.

Our main result in this paper is an algorithm that decides, for any given stochastic game $\Gamma$ and $\varepsilon > 0$, whether or not $\Gamma$ is $\varepsilon$-ergodic, and provides a witness for its $\varepsilon$-ergodicity/non-ergodicity. As a corollary, we get a constructive proof of the above mentioned theorem of Vrieze [Vri80]. A notion central to our algorithm is the concept of a potential transformation introduced in the following section.

1.2 Potential transformations

In 1958 Gallai [Gal58] suggested the following simple transformation. Let $x : V \to \mathbb{R}$ be a mapping that assigns to each state $v \in V$ a real number $x^v$ called the potential of $v$. For every transition $(v, u)$ and pair of actions $k \in K^v$ and $\ell \in L^v$ let us transform the payoff as follows:

\[ r^{vu}_{k\ell}(x) = r^{vu}_{k\ell} + x^v - x^u. \]

Then the one step expected payoff amount changes to $\mathbb{E}[b_j(x)] = \mathbb{E}[b_j] + \mathbb{E}[x^{v_j}] - \mathbb{E}[x^{v_{j+1}}]$, where $v_j \in V$ is the (random) position reached at step $j$ of the play. However, as the sum of these expectations telescopes, the limiting average payoff remains the same for all finite potentials:

\[ g^{v_0}(\Gamma(x)) = g^{v_0}(\Gamma) + \lim_{N \to \infty} \frac{1}{N+1}\, \mathbb{E}[x^{v_0} - x^{v_{N+1}}] = g^{v_0}(\Gamma). \]

Thus, the transformed game remains equivalent to the original one.

Using potential transformations we may be able to obtain a proof for ergodicity/non-ergodicity. This is made more precise in the following section. Here $m^v$ denotes the value of the matrix game $A^v$ at state $v$.
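The telescoping argument can be checked numerically. The following is a minimal sketch, assuming the simplest possible setting in which each player has a single action at every state, so the game is a plain Markov chain with transition rewards; the chain `p`, the rewards `r`, the potentials `x` and the helper names are made up for illustration and do not come from the paper.

```python
def cesaro_average(p, r, v0, steps):
    """Exact average of the expected one-step rewards over `steps` steps from v0."""
    n = len(p)
    dist = [1.0 if v == v0 else 0.0 for v in range(n)]  # distribution of v_j
    total = 0.0
    for _ in range(steps):
        # expected reward of the transition taken at the current step
        total += sum(dist[v] * p[v][u] * r[v][u] for v in range(n) for u in range(n))
        # push the distribution one step forward
        dist = [sum(dist[v] * p[v][u] for v in range(n)) for u in range(n)]
    return total / steps


def transform_rewards(r, x):
    """Potential transformation: r(x)[v][u] = r[v][u] + x[v] - x[u]."""
    n = len(r)
    return [[r[v][u] + x[v] - x[u] for u in range(n)] for v in range(n)]


# Hypothetical 3-state chain, rewards and potentials (illustration only).
p = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.0, 0.6, 0.4]]
r = [[1.0, 4.0, 0.0], [2.0, 0.0, 3.0], [0.0, 5.0, 1.0]]
x = [10.0, -7.0, 3.0]
steps = 2000

gap = abs(cesaro_average(p, r, 0, steps)
          - cesaro_average(p, transform_rewards(r, x), 0, steps))
# The telescoped sum leaves only one potential difference, divided by `steps`.
bound = (max(x) - min(x)) / steps
```

The two averages differ by at most `bound`, which vanishes as `steps` grows; this is the finite-horizon form of the limit displayed above.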

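1.3 Local and Global Values and Concepts of Ergodicity

Let us consider an arbitrary potential $x \in \mathbb{R}^V$ and define the local value $m^v(x)$ at position $v \in V$ as the value of the $|K^v| \times |L^v|$ local reward matrix game $A^v(x)$ with entries

\[ a^v_{k\ell}(x) = \sum_{u \in V} p^{vu}_{k\ell}\,\bigl(r^{vu}_{k\ell} + x^v - x^u\bigr), \quad\text{for all } k \in K^v,\ \ell \in L^v, \tag{3} \]

that is,

\[ m^v(x) = \mathrm{Val}(A^v(x)) := \max_{\alpha^v \in \Delta(K^v)} \min_{\beta^v \in \Delta(L^v)} \alpha^v A^v(x) \beta^v = \min_{\beta^v \in \Delta(L^v)} \max_{\alpha^v \in \Delta(K^v)} \alpha^v A^v(x) \beta^v. \]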
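To make the local value concrete, here is a small sketch that builds the matrix of (3) for one state and evaluates it when both players have exactly two actions, using the classical closed form for 2x2 matrix games. The data layout `p_v[k][l][u]`, the function names and the example numbers are assumptions made for this illustration; larger games would need a linear programming solver, which is not shown.

```python
def local_matrix(p_v, r_v, x, v):
    """Entries a_kl(x) = sum_u p_v[k][l][u] * (r_v[k][l][u] + x[v] - x[u])."""
    n = len(x)
    return [[sum(p_v[k][l][u] * (r_v[k][l][u] + x[v] - x[u]) for u in range(n))
             for l in range(len(p_v[k]))]
            for k in range(len(p_v))]


def value_2x2(a):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    maximin = max(min(row) for row in a)
    minimax = min(max(a[0][l], a[1][l]) for l in range(2))
    if maximin == minimax:          # saddle point in pure strategies
        return maximin
    (a11, a12), (a21, a22) = a      # otherwise both players mix
    return (a11 * a22 - a12 * a21) / (a11 + a22 - a12 - a21)


# Hypothetical state v = 0 of a two-state game: matching-pennies rewards,
# and every action pair moves to either state with probability 1/2.
p_v = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
r_v = [[[1.0, 1.0], [-1.0, -1.0]], [[-1.0, -1.0], [1.0, 1.0]]]

m_zero = value_2x2(local_matrix(p_v, r_v, [0.0, 0.0], 0))     # no potentials
m_shifted = value_2x2(local_matrix(p_v, r_v, [2.0, 0.0], 0))  # x^0 raised by 2
```

In this example raising $x^0$ by 2 adds $2 - \tfrac12 \cdot 2 = 1$ to every entry of the local matrix, so the local value moves from 0 to 1.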
To a pair of stationary strategies $\alpha = (\alpha^v \mid v \in V) \in \mathcal{K}(\Gamma)$ and $\beta = (\beta^v \mid v \in V) \in \mathcal{L}(\Gamma)$ we associate a Markov chain on states in $V$, defined by the transition probabilities $p^{vu}_{\alpha,\beta} = \alpha^v P^{vu} \beta^v$. Then, this Markov chain has unique limiting probability distributions $(q^{vu}_{\alpha,\beta} \mid u \in V)$, where $q^{vu}_{\alpha,\beta}$ is the probability of staying in state $u \in V$ when the initial vertex is $v \in V$. With this notation, the limiting average payoff $g^v(\alpha, \beta)$ starting from vertex $v \in V$ can be computed as

\[ g^v(\alpha, \beta) = \sum_{u \in V} q^{vu}_{\alpha,\beta}\; \alpha^u A^u \beta^u. \tag{4} \]
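As an illustration of (4), the sketch below fixes a pair of stationary strategies, builds the induced Markov chain and the local expected payoffs, and approximates the limiting average payoff by a long finite Cesaro average. The nested-list layout (`P[v][u]` is the $K^v \times L^v$ matrix $P^{vu}$), the helper names and the tiny two-state example are assumptions made here for illustration only.

```python
def bilinear(alpha, matrix, beta):
    """alpha^T * matrix * beta for plain nested lists."""
    return sum(alpha[k] * matrix[k][l] * beta[l]
               for k in range(len(alpha)) for l in range(len(beta)))


def induced_chain(P, A, alpha, beta):
    """Transition matrix and one-step payoffs once both strategies are fixed."""
    n = len(P)
    chain = [[bilinear(alpha[v], P[v][u], beta[v]) for u in range(n)] for v in range(n)]
    payoff = [bilinear(alpha[v], A[v], beta[v]) for v in range(n)]
    return chain, payoff


def limiting_average(chain, payoff, v0, steps=5000):
    """Finite Cesaro average approximating g^{v0}(alpha, beta)."""
    n = len(chain)
    dist = [1.0 if v == v0 else 0.0 for v in range(n)]
    total = 0.0
    for _ in range(steps):
        total += sum(dist[v] * payoff[v] for v in range(n))
        dist = [sum(dist[v] * chain[v][u] for v in range(n)) for u in range(n)]
    return total / steps


# Hypothetical game: two actions per player at state 0, one action each at state 1.
P = [[[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]],   # from state 0
     [[[0.5]], [[0.5]]]]                                     # from state 1
A = [[[2.0, 0.0], [0.0, 2.0]], [[0.0]]]
alpha = [[0.5, 0.5], [1.0]]
beta = [[0.5, 0.5], [1.0]]

chain, payoff = induced_chain(P, A, alpha, beta)
g0 = limiting_average(chain, payoff, 0)
g1 = limiting_average(chain, payoff, 1)
```

Here the induced chain is uniform on the two states with one-step payoffs 1 and 0, so both `g0` and `g1` approach 1/2.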

The game is said to be solvable in uniformly optimal stationary strategies if there exist stationary strategies $\alpha \in \mathcal{K}(\Gamma)$ and $\beta \in \mathcal{L}(\Gamma)$ such that for all initial states $v \in V$

\[ g^v(\alpha, \beta) = \max_{\alpha' \in \mathcal{K}(\Gamma)} g^v(\alpha', \beta) = \min_{\beta' \in \mathcal{L}(\Gamma)} g^v(\alpha, \beta'). \tag{5} \]

This common quantity, if it exists, is the value of the game with initial position $v \in V$, and will be simply denoted by $g^v = g^v(\Gamma)$.

1.4 Main Result 

Given an undiscounted zero-sum stochastic game, we try to reduce the range of its local values by a potential transformation $x \in \mathbb{R}^V$. If they are equalized by some potential $x$, that is, $m^v(x) = m$ is a constant for all $v \in V$, we say that the game is brought to its ergodic canonical form [BEGM13a]. In this case, one can show that the values $g^v$ exist and are equal to $m$ for all initial positions $v \in V$, and furthermore, locally optimal strategies are globally optimal [BEGM13a]. Thus, the game is solved in uniformly optimal strategies. However, typically we are not that lucky.

To state our main theorem, we need more notation.

• $W > 0$ is the smallest integer such that either $p^{vu}_{k\ell} = 0$ or $p^{vu}_{k\ell} \ge 1/W$

• $R$ is the smallest real such that

\[ 0 \le r^{vu}_{k\ell} \le R \tag{6} \]

• $N = \max_{v \in V} \{\max\{|K^v|, |L^v|\}\}$.

• $n = |V|$

• $\eta = \max\{\log_2 R, \log_2 W\}$ (maximum "bit length")

Theorem 1 For every stochastic game and $\varepsilon > 0$ we can find in
^ nNWR ^oC^ either a potential vector $x \in \mathbb{R}^V$ proving that

the game is $(24\varepsilon)$-ergodic, or stationary strategies for the players
proving that it is not $\varepsilon$-ergodic.
The proof of Theorem 1 will be given in Section 4. One major hurdle that we face is that the range of potentials can grow doubly exponentially as iterations proceed, leading to much worse bounds than those stated in the theorem. To deal with this issue, we use quantifier elimination techniques [BPR96, GV88, Ren92] to reduce the range of potentials after each iteration; see the discussion preceding Lemma 9.


2 Related Work 

The above definition of ergodicity follows Moulin's concept of the ergodic extension of a matrix game [Mou76] (which is a very special example of a stochastic game with perfect information). Let us note that slightly different terminology is used in Markov chain theory; see, for example, [KS63].

The following four algorithms for undiscounted stochastic games are based on stronger "ergodicity type" conditions: the strategy iteration algorithm by Hoffman and Karp [HK66] requires that for any pair of stationary strategies of the two players the obtained Markov chain is irreducible; two value iteration algorithms by Federgruen are based on similar but slightly weaker requirements (see [Fed80] for the definitions and more details); the recent algorithm of Chatterjee and Ibsen-Jensen [CIJ14] assumes a weaker requirement than the strong ergodicity required by Hoffman and Karp [HK66]: they call a stochastic game almost surely ergodic if for any pair of (not necessarily stationary) strategies of the two players, and any starting position, some strongly ergodic class (in the sense of [HK66]) is reached with probability 1.

While these restrictions apply to the structure of the game, our ergodicity definition only restricts the value. Moreover, the results in [HK66] and [CIJ14] apply to a game that already satisfies the ergodicity assumption, which seems to be hard to check. Our algorithm, on the other hand, always produces an answer, regardless of whether the game is ergodic or not.

Interestingly, potentials appear in [Fed80] implicitly, as the differences of local values of positions, as well as in [HK66], as the dual variables to linear programs corresponding to the controlled Markov processes, which appear when a player optimizes his strategy against a given strategy of the opponent. Yet, the potential transformation is not considered explicitly in these papers.

We prove Theorem 1 by an algorithm that extends the approach recently obtained for ergodic stochastic games with perfect information [BEGM13a] and extended to the general (not necessarily ergodic) case in [BEGM13b]. This approach is also somewhat similar to the first of two value iteration algorithms suggested by Federgruen in [Fed80], though our approach has some distinct characteristics: it is assumed in [Fed80] that the values $g^v$ exist and are equal for all $v$; in particular, this assumption implies $\varepsilon$-ergodicity for every $\varepsilon > 0$. For our approach we do not need such an assumption. We can verify $\varepsilon$-ergodicity for an arbitrary given $\varepsilon > 0$, or provide a proof for non-ergodicity (with a small gap) in finite time. Moreover, while the approach of [Fed80] was only shown to converge, we provide a bound in terms of the input parameters for the number of steps.

Several other algorithms for solving undiscounted zero-sum stochastic games in stationary strategies are surveyed by Raghavan and Filar; see Sections 4 (B) and 5 in [RF91]. The only algorithmic results that we are aware of that provide bounds on the running time for approximating the value of general (undiscounted) stochastic games are those given in [CMH08, HKL+11]: in [CMH08], the authors provide an algorithm that approximates, within any factor of $\varepsilon > 0$, the value of any stochastic game (in history dependent strategies) in time
(riN)"^ poly(? 7 , log i). In the authors give algorithms for discounted 

and recursive stochastic games that run in time 2^°'^ ' poly(77, log(i)), and 
claim also that similar bounds can be obtained for general stochastic games, by 

O ( 71 ^ ) 

reducing them to the discounted version using a discount factor of J 
(and this bound on the discount factor is almost tight [Mil11]). These results are based on quantifier elimination techniques and yield very complicated history-dependent strategies. For almost sure ergodic games, a variant of the algorithm of Hoffman and Karp [HK66] was given in [CIJ14]: this algorithm finds $\varepsilon$-optimal stationary

( 2 n\ 

poly$(N, \eta)$. This result is not comparable to ours, since the classes of games they deal with are somewhat different (although both generalize the class of strongly ergodic games of [HK66]). Furthermore, the algorithm in Theorem 1 exhibits the additional feature that it either provides a solution in stationary strategies in the ergodic case, if one exists, or produces a pair of stationary strategies that witness the non-ergodicity.


3 Pumping Algorithm 

We begin by describing our procedure on an abstract level. Then we specialize it to stochastic games in Section 4.

Given a subset $S \subseteq V$, let us denote by $e_S \in \{0,1\}^V$ the characteristic vector of $S$.

Let us further assume that $m^v(x)$ for $v \in V$ are functions depending on potentials $x \in \mathbb{R}^V$ (where $n = |V|$) and satisfying the following properties for all subsets $S \subseteq V$ and reals $\delta \ge 0$:

(i) $m^v(x - \delta e_S)$ is a monotone decreasing function of $\delta$ if $v \in S$;

(ii) $m^v(x - \delta e_S)$ is a monotone increasing function of $\delta$ if $v \notin S$;

(iii) $|m^v(x) - m^v(x - \delta e_S)| \le \delta$ for all $v \in V$.

We show in this section that under the above conditions we can iteratively change the potentials to some $x' \in \mathbb{R}^V$ such that either all values $m^v(x')$, $v \in V$, are very close to one another or we can find a decomposition of the states $V$ into disjoint subsets proving that such convergence of the values is not possible.
Our main procedure is described in Algorithm 1 below. Given the current vector of potentials $x_\tau$ at iteration $\tau$, the procedure partitions the set of vertices into four sets according to the local value $m^v(x_\tau)$. If either the first (top) set $T_\tau$ or the fourth (bottom) set $B_\tau$ is empty, the procedure terminates; otherwise, the potentials of all the vertices in the first and second sets are reduced by the same amount $\delta$, and the computation proceeds to the next iteration.


Algorithm 1 Pump($x$, $S$)

Input: a stochastic game $\Gamma$, a subset $S$ of states.

Output: a potential $x \in \mathbb{R}^S$.

1: Initialize $\tau := 0$, and $x_\tau := x$.

2: Set $m^+ := \max_{v \in S} m^v(x_\tau)$, $m^- := \min_{v \in S} m^v(x_\tau)$, and $\delta := (m^+ - m^-)/4$.

3: Define

   $T_\tau := \{v \in S \mid m^v(x_\tau) \ge m^- + 3\delta\}$

   $B_\tau := \{v \in S \mid m^v(x_\tau) < m^- + \delta\}$

   $M_\tau := S \setminus (T_\tau \cup B_\tau)$.

4: if $T_\tau = \emptyset$ or $B_\tau = \emptyset$ then
5:   return $x_\tau$

6: end if

7: Otherwise, set $P_\tau := \{v \in S \mid m^v(x_\tau) \ge m^- + 2\delta\}$ and update

   $x^v_{\tau+1} := x^v_\tau - \delta$ if $v \in P_\tau$,

   $x^v_{\tau+1} := x^v_\tau$ otherwise.

8: Set $\tau := \tau + 1$ and Goto step 3.
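The loop of Algorithm 1 can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the local values are supplied by a caller-provided function `local_values(x)` (hypothetical name), the strict and non-strict comparisons follow the reading used in the proof of Lemma 1, and an iteration cap `max_iters` is added because the procedure need not terminate in general. The toy example uses a Markov chain with one action per player, for which $m^v(x) = r^v + x^v - \sum_u p^{vu} x^u$.

```python
def pump(local_values, x, S, max_iters=10_000):
    """Sketch of Pump(x, S): returns the potentials when T or B becomes empty."""
    x = list(x)
    m = local_values(x)
    m_plus = max(m[v] for v in S)
    m_minus = min(m[v] for v in S)
    delta = (m_plus - m_minus) / 4          # fixed during the whole call (step 2)
    for _ in range(max_iters):
        m = local_values(x)
        top = [v for v in S if m[v] >= m_minus + 3 * delta]
        bottom = [v for v in S if m[v] < m_minus + delta]
        if not top or not bottom:
            return x
        pumped = [v for v in S if m[v] >= m_minus + 2 * delta]
        for v in pumped:                    # reduce the potentials of the upper half
            x[v] -= delta
    raise RuntimeError("Pump did not terminate within max_iters iterations")


# Hypothetical single-action example: uniform two-state chain, local rewards 0 and 1.
p = [[0.5, 0.5], [0.5, 0.5]]
r = [0.0, 1.0]

def chain_values(x):
    return [r[v] + x[v] - sum(p[v][u] * x[u] for u in range(2)) for v in range(2)]

x_out = pump(chain_values, [0.0, 0.0], [0, 1])
m_out = chain_values(x_out)
```

In this example two pumping steps lower the potential of state 1 to $-0.5$, after which the bottom set is empty and the range of local values has shrunk from 1 to 0.5.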


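We can show next that properties (i), (ii) and (iii) above guarantee some simple properties for the above procedure.

Lemma 1 We have $T_{\tau+1} \subseteq T_\tau$, $B_{\tau+1} \subseteq B_\tau$ and $M_{\tau+1} \supseteq M_\tau$ for all iterations $\tau = 0, 1, \ldots$

Proof Indeed, by (i) and (iii) we can conclude that $m^v(x_{\tau+1}) \ge m^- + \delta$ holds for all $v \in P_\tau$. Analogously, by (ii) and (iii) $m^v(x_{\tau+1}) < m^- + 3\delta$ follows for all $v \notin P_\tau$. Hence no vertex of $P_\tau$ can enter the bottom set and no vertex outside $P_\tau$ can enter the top set. □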


Lemma 2 Either $T_\tau = \emptyset$ or $B_\tau = \emptyset$ for some finite $\tau$, or there are nonempty disjoint subsets $I, F \subseteq S$, $I \supseteq T_\tau$, $F \supseteq B_\tau$, and a threshold $\tau_0$, such that for every real $\Delta > 0$ there exists a finite index $\tau(\Delta) \ge \tau_0$ such that

(a) $m^v(x_\tau) \ge m^- + 2\delta$ for all $v \in I$ and $m^v(x_\tau) < m^- + 2\delta$ for all $v \in F$, and for all $\tau \ge \tau_0$;

(b) $x^u_\tau - x^v_\tau \ge \Delta$ for all $v \in I$ and $u \notin I$, and for all $\tau \ge \tau(\Delta)$;

(c) $x^v_\tau - x^u_\tau \ge \Delta$ for all $v \in F$ and $u \notin F$, and for all $\tau \ge \tau(\Delta)$.

Proof By Lemma 1, sets $T_\tau$ and $B_\tau$ can change only monotonically, and hence at most $|S|$ times. Thus, if Pump($x$, $S$) does not stop in a finite number of iterations, then after a finite number of iterations the sets $T_\tau$ and $B_\tau$ will never change and all positions in $T_\tau$ remain always pumped (that is, have their potentials reduced), while all positions in $B_\tau$ will never be pumped again.

Assuming now that the pumping algorithm Pump($x$, $S$) does not terminate, let us define the subset $I \subseteq S$ as the set of all those positions which are always pumped with the exception of a finite number of iterations. Analogously, let $F$ be the subset of all those positions that are never pumped with the exception of a finite number of iterations. Since $I$ and $F$ are finite sets, there must exist a finite $\tau_0$ such that for all $\tau \ge \tau_0$ we have $I \subseteq P_\tau$ and $F \cap P_\tau = \emptyset$, implying (a). Note that any vertex in $T_\tau$ is always pumped by (iii) and hence $T_\tau \subseteq I$ for any $\tau \ge \tau_0$; similarly, $B_\tau \subseteq F$ for any $\tau \ge \tau_0$.

Let us next observe that all positions not in $I \cup F$ are both pumped and not pumped infinitely many times. Thus, since $\delta$ is a fixed constant, for every $\Delta$ there must exist an iteration $\tau(\Delta) \ge \tau_0$ such that all positions not in $I$ are not pumped at least $\Delta/\delta$ more times than those in $I$, and all positions not in $F$ are pumped at least $\Delta/\delta$ more times than those in $F$, implying (b) and (c). □


Let us next describe the use of Pump($x$, $S$) for repeatedly shrinking the range of the $m^v$ values, or to produce some evidence that this is not possible. The simplest version is the following:


Algorithm 2 RepeatedPumping($\varepsilon$)

1: Initialize $h := 0$, and $x_h := 0 \in \mathbb{R}^V$.

2: Set $m^+(h) := \max_{v \in V} m^v(x_h)$ and $m^-(h) := \min_{v \in V} m^v(x_h)$.

3: If $m^+(h) - m^-(h) \le \varepsilon$ then STOP.

4: $x_{h+1} :=$ Pump($x_h$, $V$); $h := h + 1$.

5: Goto step 2.
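The outer loop of Algorithm 2 can be sketched as follows. As before this is a hedged illustration rather than the authors' implementation: `pump` is a compact re-statement of Algorithm 1 on the full vertex set, `local_values` is a caller-supplied function (hypothetical name), the caps `max_iters` and `max_rounds` are additions guarding against the non-terminating (non-ergodic) case, and the example is a single-action Markov chain.

```python
def pump(local_values, x, max_iters=10_000):
    """Compact version of Pump(x, V) for a caller-supplied local value function."""
    x = list(x)
    m = local_values(x)
    m_minus, delta = min(m), (max(m) - min(m)) / 4
    for _ in range(max_iters):
        m = local_values(x)
        top = [v for v in range(len(x)) if m[v] >= m_minus + 3 * delta]
        bottom = [v for v in range(len(x)) if m[v] < m_minus + delta]
        if not top or not bottom:
            return x
        for v in range(len(x)):
            if m[v] >= m_minus + 2 * delta:   # upper half gets pumped
                x[v] -= delta
    raise RuntimeError("Pump did not terminate within max_iters iterations")


def repeated_pumping(local_values, n, eps, max_rounds=1000):
    """Call Pump until all local values lie within a band of width eps."""
    x = [0.0] * n
    for _ in range(max_rounds):
        m = local_values(x)
        if max(m) - min(m) <= eps:
            return x
        x = pump(local_values, x)
    raise RuntimeError("range did not shrink below eps within max_rounds rounds")


# Hypothetical example: uniform two-state chain with local rewards 0 and 1.
p = [[0.5, 0.5], [0.5, 0.5]]
r = [0.0, 1.0]

def chain_values(x):
    return [r[v] + x[v] - sum(p[v][u] * x[u] for u in range(2)) for v in range(2)]

x_final = repeated_pumping(chain_values, 2, 0.1)
m_final = chain_values(x_final)
```

For this chain every call of `pump` halves the range of local values, so four calls bring it from 1 down to 0.0625, and the local values stay centered around the common limiting average 1/2.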



Note that by our above analysis, RepeatedPumping either returns a potential transformation for which all $m^v$, $v \in V$, values are within an $\varepsilon$-band, or returns the sets $I$ and $F$ as in Lemma 2 with arbitrarily large potential differences from the other positions. In the next section we use a modification of these procedures for stochastic games, and show that those large potential differences can be used to prove that the game is not $\varepsilon$-ergodic.

4 Application of Pumping for Stochastic Games 

We show in this section how to use RepeatedPumping to find potential transformations verifying $\varepsilon$-ergodicity, or proving that the game is not $\varepsilon$-ergodic, thus establishing a proof of Theorem 1. Towards this end, we shall give some necessary and sufficient conditions for $\varepsilon$-non-ergodicity, and consider a modified version of the pumping algorithm described in the previous section which will provide a constructive proof for the above theorem.

Let us first observe that the local value function of stochastic games satisfies the properties required to run the pumping algorithm described in the previous section.

Lemma 3 For every subset $S \subseteq V$ and $\delta \ge 0$ and for all $v \in V$ we have

\[ \begin{aligned}
m^v(x) \ \ge\ m^v(x - \delta e_S) \ &\ge\ m^v(x) - \delta \max_{k,\ell} \sum_{u \notin S} p^{vu}_{k\ell} && \text{if } v \in S, \\
m^v(x) \ \le\ m^v(x - \delta e_S) \ &\le\ m^v(x) + \delta \max_{k,\ell} \sum_{u \in S} p^{vu}_{k\ell} && \text{if } v \notin S.
\end{aligned} \tag{7} \]

Furthermore, the value functions $m^v(x)$ for $v \in V$ satisfy properties (i), (ii) and (iii) stated in Section 3.

Proof According to (3) we must have for all $\delta \ge 0$ that $A^v(x) \ge A^v(x - \delta e_S)$ for all $v \in S$ and $A^v(x) \le A^v(x - \delta e_S)$ for all $v \notin S$, proving properties (i) and (ii). (Indeed, $A^v(x - \delta e_S) = A^v(x) - \delta\bigl(E^v - \sum_{u \in S} P^{vu}\bigr)$ for $v \in S$ and $A^v(x - \delta e_S) = A^v(x) + \delta \sum_{u \in S} P^{vu}$ for $v \notin S$, where $E^v$ is the $|K^v| \times |L^v|$ matrix of all ones. Since the operator $\mathrm{Val}(B)$ is monotone increasing in $B$, inequalities (7) follow.) Property (iii) follows directly from (7). □
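The inequalities of Lemma 3 can be sanity-checked numerically in the special case where each player has a single action at every state, so that $m^v(x) = r^v + x^v - \sum_u p^{vu} x^u$ and the maximum over $k, \ell$ is trivial. The sketch below covers only this assumed special case; the function names, the chain and the numbers are illustrative and not taken from the paper.

```python
def local_values(p, r, x):
    """Single-action case: m^v(x) = r[v] + x[v] - sum_u p[v][u] * x[u]."""
    n = len(x)
    return [r[v] + x[v] - sum(p[v][u] * x[u] for u in range(n)) for v in range(n)]


def check_lemma3(p, r, x, S, delta, tol=1e-9):
    """Check both lines of (7) after lowering the potentials on S by delta."""
    n = len(x)
    shifted = [x[v] - delta if v in S else x[v] for v in range(n)]
    before = local_values(p, r, x)
    after = local_values(p, r, shifted)
    for v in range(n):
        inside = sum(p[v][u] for u in S)     # probability of moving into S
        if v in S:
            ok = before[v] + tol >= after[v] >= before[v] - delta * (1 - inside) - tol
        else:
            ok = before[v] - tol <= after[v] <= before[v] + delta * inside + tol
        if not ok:
            return False
    return True


# Hypothetical 3-state chain (illustration only).
p = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.0, 0.6, 0.4]]
r = [1.0, 0.0, 2.0]
x = [3.0, -1.0, 0.5]
holds = check_lemma3(p, r, x, {0, 2}, 0.7)
```

In the single-action case both bounds of (7) are attained with equality, which is why a small tolerance is used in the comparisons.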

The above lemma implies that procedures Pump and RepeatedPumping could, in principle, be used to find a potential transformation yielding an $\varepsilon$-ergodic solution. It does not offer, however, a way to discover $\varepsilon$-non-ergodicity. Towards this end, we need to find some sufficient and algorithmically achievable conditions for $\varepsilon$-non-ergodicity.

Let us first analyze (0-)non-ergodicity of stochastic games (in stationary strategies).

Lemma 4 A stochastic game is non-ergodic if and only if it is $\varepsilon$-non-ergodic for some positive $\varepsilon$.

Proof A stochastic game is non-ergodic by definition if there exists a threshold $\sigma$, positions $v, u \in V$, and stationary strategies $\alpha$ and $\beta$ for the players, such that no matter what other strategy $\beta'$ player 2 chooses the Markov chain resulting by fixing $(\alpha, \beta')$ has a value $> \sigma$ when using initial position $v_0 = v$ (guaranteeing for player 1 more than $\sigma$ from $v$), and the Markov chain obtained by fixing $(\alpha', \beta)$ has a value $< \sigma$ when using initial position $v_0 = u$ (guaranteeing for player 2 less than $\sigma$ from $u$). Since strategies $\alpha'$ and $\beta'$ are chosen from a compact space, the above implies that there are $\sigma' > \sigma > \sigma''$ such that $\alpha$ guarantees for player 1 at least $\sigma'$ from the initial position $v$, and $\beta$ guarantees for player 2 at most $\sigma''$ from initial position $u$. Hence the game is $\varepsilon$-non-ergodic for any $\varepsilon < \sigma' - \sigma''$. □
Lemma 5 A stochastic game $\Gamma$ is $\varepsilon$-non-ergodic if there exist disjoint non-empty subsets of the positions $I, F \subseteq V$, reals $a, b$ with $b - a > \varepsilon$, stationary strategies $\bar\alpha^v$, $v \in I$, for player 1, and $\bar\beta^u$, $u \in F$, for player 2, and a vector of potentials $x \in \mathbb{R}^V$ such that

(N1) $\bar\alpha^v_k\, p^{vu}_{k\ell} = 0$ for all $v \in I$, $u \notin I$, $k \in K^v$ and $\ell \in L^v$,

(N2) $\bar\beta^u_\ell\, p^{uw}_{k\ell} = 0$ for all $u \in F$, $w \notin F$, $\ell \in L^u$ and $k \in K^u$, and

(N3) for all $v \in I$ and $u \in F$:

\[ \min_{\beta^v \in \Delta(L^v)} \bar\alpha^v A^v(x) \beta^v \ge b \quad\text{and}\quad \max_{\alpha^u \in \Delta(K^u)} \alpha^u A^u(x) \bar\beta^u \le a. \]

Proof Let us note that (N1) and (N3) imply that for all strategies $\beta' \in \mathcal{L}(\Gamma)$ of player 2, the pair of strategies $(\alpha, \beta')$, where $\alpha^v := \bar\alpha^v$ for $v \in I$ and $\alpha^v \in \Delta(K^v)$ is chosen arbitrarily for $v \notin I$, results in a Markov chain in which subset $I$ induces one or more absorbing sets (that is, the transition probability from any $v \in I$ to any $u \notin I$ is 0), and in which all positions of $I$ have values at least $b$. Analogously, (N2) and (N3) imply that $F$ will always induce an absorbing set with values at most $a$, if we fix any pair of strategies $(\alpha', \beta)$, where $\alpha'$ is any strategy in $\mathcal{K}(\Gamma)$, $\beta^v := \bar\beta^v$ for $v \in F$ and $\beta^v \in \Delta(L^v)$ is chosen arbitrarily for $v \notin F$. Hence choosing any positions $v \in I$ and $u \in F$ and strategies $\alpha$ and $\beta$ provides a witness for the $\varepsilon$-non-ergodicity of $\Gamma$. (Here, we use the well-known fact [MO70] that, to each player's stationary strategy, there is a best response of the opponent which is also stationary.) □

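Let us introduce a notation for denoting upper bounds on the entries of the matrices, more precisely on the part of these entries which do not depend on negative potential differences. Specifically, define

\[ \begin{aligned}
\bar a^v_{k\ell}(x) &= \sum_{u \in V} p^{vu}_{k\ell}\, r^{vu}_{k\ell} + \sum_{u \in V,\ x^u < x^v} p^{vu}_{k\ell}\,(x^v - x^u), \\
\bar b^v_{k\ell}(x) &= m^+(x) - \sum_{u \in V} p^{vu}_{k\ell}\, r^{vu}_{k\ell} - \sum_{u \in V,\ x^u > x^v} p^{vu}_{k\ell}\,(x^v - x^u),
\end{aligned} \tag{8} \]

where, as before, $m^+(x) := \max_v m^v(x)$, $m^-(x) := \min_v m^v(x)$. Define further

\[ R^v(x) = \begin{cases} \max_{k \in K^v,\ \ell \in L^v} \bar a^v_{k\ell}(x) & \text{if } m^v(x) \ge \dfrac{m^+(x) + m^-(x)}{2}, \\[2mm] \max_{k \in K^v,\ \ell \in L^v} \bar b^v_{k\ell}(x) & \text{otherwise.} \end{cases} \tag{9} \]

Note that

\[ m^+(x) - \bar b^v_{k\ell}(x) \ \le\ a^v_{k\ell}(x) \ \le\ \bar a^v_{k\ell}(x) \quad\text{for all } v \in V,\ k \in K^v,\ \ell \in L^v \text{ and } x \in \mathbb{R}^V, \]

which implies

\[ \begin{cases} m^v(x) \le R^v(x) & \text{if } m^v(x) \ge \dfrac{m^+(x) + m^-(x)}{2}, \\[2mm] m^v(x) \ge m^+(x) - R^v(x) & \text{otherwise,} \end{cases} \qquad\text{for all } v \in V \text{ and } x \in \mathbb{R}^V. \tag{10} \]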

With this notation we can state a more constructive version of Lemma 5.

Lemma 6 A stochastic game $\Gamma$ satisfying (6) is $\varepsilon$-non-ergodic if there exist disjoint non-empty subsets $I, F \subseteq V$, a vector of potentials $x \in \mathbb{R}^V$ and reals $a', b' \in [0, m^+(x)]$ with $b' - a' > 3\varepsilon$ and $a' < \frac{m^+(x) + m^-(x)}{2} \le b'$, such that

(N4) $m^v(x) \ge b'$ for all $v \in I$, and $m^u(x) \le a'$ for all $u \in F$;

(N5) $x^u - x^v \ge |L^v|\, W R^v(x)^2 / \varepsilon$ for all $u \notin I$, and $v \in I$;

(N6) $x^u - x^v \ge |K^u|\, W R^u(x)^2 / \varepsilon$ for all $u \in F$, and $v \notin F$.


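Proof We first show that (N4)-(N5) imply the existence of strategies $\bar\alpha^v$, for $v \in I$, satisfying (N1) and (N3). We shall then observe that a similar argument can be applied to (N4) and (N6) to show the existence of strategies $\bar\beta^u$, for $u \in F$, such that those satisfy (N2) and (N3). Consequently, our claim will follow by Lemma 5.

Let us now fix a position $v \in I$ and denote respectively by $\tilde\alpha^v$ and $\tilde\beta^v$ the optimal strategies of the players with respect to the payoff matrix $A^v(x)$. Denote further by $\hat\beta^v = \frac{1}{|L^v|}(1, \ldots, 1)$ the uniform strategy for player 2, and set

\[ \tilde K^v = \Bigl\{ k \in K^v \ \Bigm|\ \sum_{u \notin I,\ \ell \in L^v} p^{vu}_{k\ell} = 0 \Bigr\}. \]

Let us then note that we have

\[ \bigl(A^v(x)\hat\beta^v\bigr)_k \ \le\ \begin{cases} R^v(x) & \text{if } k \in \tilde K^v, \\[1mm] R^v(x) - \dfrac{R^v(x)^2}{\varepsilon} & \text{otherwise,} \end{cases} \]

since at least one of the potential differences bounded in (N5) has a coefficient of at least $\frac{1}{W |L^v|}$ in rows which are not in $\tilde K^v$.

Note that $b' > 0$ implies by (10) that $R^v(x) > 0$. Thus by the optimality of $\tilde\alpha^v$ and by the above inequalities we have

\[ 0 < b' \ \le\ m^v(x) \ \le\ \tilde\alpha^v A^v(x) \hat\beta^v \ \le\ R^v(x) - \Bigl(\sum_{k \notin \tilde K^v} \tilde\alpha^v_k\Bigr) \frac{R^v(x)^2}{\varepsilon}, \]

implying that $\sum_{k \notin \tilde K^v} \tilde\alpha^v_k < \frac{\varepsilon}{R^v(x)}$. Since by (N4) we have $0 \le a'$, inequalities $\varepsilon < a' + 3\varepsilon < b' \le m^v(x) \le R^v(x)$ follow, and hence $\sum_{k \notin \tilde K^v} \tilde\alpha^v_k < 1$ must hold, implying that the set $\tilde K^v$ is not empty. Let us then denote by $\hat\alpha^v$ the truncated strategy defined by

\[ \hat\alpha^v_k = \begin{cases} \dfrac{\tilde\alpha^v_k}{\sum_{k' \in \tilde K^v} \tilde\alpha^v_{k'}} & \text{if } k \in \tilde K^v, \\[2mm] 0 & \text{if } k \notin \tilde K^v. \end{cases} \]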
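With this we have for any $\beta^v \in \Delta(L^v)$

\[ \begin{aligned}
b' \ \le\ m^v(x) \ &\le\ \tilde\alpha^v A^v(x) \beta^v \\
&\le\ \Bigl(\sum_{k \in \tilde K^v} \tilde\alpha^v_k\Bigr)\, \hat\alpha^v A^v(x) \beta^v + \Bigl(\sum_{k \notin \tilde K^v} \tilde\alpha^v_k\Bigr) R^v(x) \\
&\le\ \hat\alpha^v A^v(x) \beta^v + \varepsilon,
\end{aligned} \]

where the second inequality uses $a^v_{k\ell}(x) \le R^v(x)$, and the last one uses $\sum_{k \notin \tilde K^v} \tilde\alpha^v_k < \varepsilon / R^v(x)$ together with $b' > \varepsilon$, which forces $\hat\alpha^v A^v(x) \beta^v > 0$.

Let us then define $\bar\alpha^v = \hat\alpha^v$ and repeat the same for all $v \in I$. Then, these strategies satisfy (N1) and (N3) with $b = b' - \varepsilon$.

Let us next note that adding a constant to all entries of a matrix game changes its value by exactly the same constant. Furthermore, multiplying all entries by $-1$ and transposing the matrix changes its value by a factor of $-1$ and interchanges the roles of the row and column players, but otherwise leaves optimal strategies optimal. Thus, we can repeat the above arguments for the matrices $B^u(x) = m^+(x) E^u - A^u(x)^T$, where $E^u$ is the $|L^u| \times |K^u|$ matrix of all ones, and obtain in the same way strategies $\bar\beta^u$, $u \in F$, satisfying (N2) and (N3) with $a = a' + \varepsilon$. This completes the proof of the lemma. □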

To create a finite algorithm to find sets $I$ and $F$ and potentials satisfying (N4)-(N6) we need to make some modifications in our procedures.

First, we allow a more flexible partitioning of the $m$-range by allowing the $m$-range boundaries to be passed as parameters and replacing line 2 in procedure Pump by

2: Set $\delta := (m^+ - m^-)/4$.

Next, let us replace in procedure Pump line 7 by the following lines, where $\varepsilon > 0$ is a prespecified parameter, and call the new procedure with these modifications ModifiedPump($\varepsilon$, $x$, $S$, $m^-$, $m^+$):

7a: Otherwise set $P_\tau := \{v \in S \mid m^v(x_\tau) \ge m^- + 2\delta\}$ and compute

\[ R^v(x_\tau) = \begin{cases} \max_{k \in K^v,\ \ell \in L^v} \bar a^v_{k\ell}(x_\tau) & \text{if } v \in P_\tau, \\[1mm] \max_{k \in K^v,\ \ell \in L^v} \bar b^v_{k\ell}(x_\tau) & \text{if } v \notin P_\tau, \end{cases} \]

where $\bar a$ and $\bar b$ are defined by (8).

7b: Create an auxiliary directed graph $G = (V, E)$ on vertex set $V$ such that

\[ (v, u) \in E \quad\text{iff}\quad \begin{cases} x^u_\tau - x^v_\tau < |L^v|\, W R^v(x_\tau)^2 / \varepsilon & \text{if } v \in P_\tau, \\[1mm] x^v_\tau - x^u_\tau < |K^v|\, W R^v(x_\tau)^2 / \varepsilon & \text{if } v \notin P_\tau. \end{cases} \]

7c: Find subsets $I_\tau$ and $F_\tau$ of $V$ such that $T_\tau \subseteq I_\tau \subseteq P_\tau$, $B_\tau \subseteq F_\tau \subseteq V \setminus P_\tau$, and no arcs are leaving these sets in $G$ (this can be done by finding the strong components of $G$, or by the method described in the proof of Theorem 1).

7d: if such sets are found STOP and output these sets, otherwise continue with step 8.
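Step 7c asks for sets that contain $T_\tau$ (respectively $B_\tau$), stay inside $P_\tau$ (respectively $V \setminus P_\tau$), and have no leaving arcs. One simple way to look for them, sketched below, is plain reachability: the set of vertices reachable from $T_\tau$ is the smallest closed set containing $T_\tau$, so a suitable $I_\tau$ exists exactly when that set lies inside $P_\tau$. This is an assumed alternative to the strong-component method mentioned in the text; the function names and the adjacency-dictionary format are choices made for this illustration.

```python
from collections import deque


def closure(start, arcs):
    """All vertices reachable from `start` along arcs, including `start` itself."""
    seen = set(start)
    queue = deque(start)
    while queue:
        v = queue.popleft()
        for u in arcs.get(v, ()):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


def find_closed_sets(arcs, top, bottom, pumped, vertices):
    """Return (I, F) with T <= I <= P and B <= F <= V - P and no leaving arcs, or None."""
    candidate_i = closure(top, arcs)
    candidate_f = closure(bottom, arcs)
    if candidate_i <= set(pumped) and candidate_f <= set(vertices) - set(pumped):
        return candidate_i, candidate_f
    return None


# Hypothetical auxiliary graph on five vertices.
vertices = {0, 1, 2, 3, 4}
pumped = {0, 1, 2}
arcs = {0: [1], 1: [0], 2: [3], 3: [4]}
found = find_closed_sets(arcs, {0}, {4}, pumped, vertices)

# Adding the arc 1 -> 3 lets the closure of the top set escape from the pumped set.
arcs_with_leak = {0: [1], 1: [0, 3], 2: [3], 3: [4]}
not_found = find_closed_sets(arcs_with_leak, {0}, {4}, pumped, vertices)
```

When the function returns `None`, the pumping simply continues with the next iteration, as in step 7d.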

Before starting to analyze this modified pumping algorithm, let us observe that we have for all iterations

\[ m^- \ <\ m^- + 2\delta = \frac{m^+ + m^-}{2} \ \le\ m^v(x_\tau) \ \le\ R^v \quad\text{for all } v \in P_\tau \tag{11} \]

as long as $m^+ - m^- > \varepsilon$.

Lemma 7 Procedure ModifiedPump($\varepsilon$, $x$, $S$) terminates in a finite number of steps.

Proof Let us observe that by Lemma 2, procedure Pump would either terminate with $T_\tau = \emptyset$ or $B_\tau = \emptyset$ for some finite $\tau$, or there exist sets $I = I_\tau$ and $F = F_\tau$ satisfying conditions (b) and (c) of the lemma, for $\Delta = NWQ^2/\varepsilon$, where $N = \max\{\max\{|K^v|, |L^v|\} : v \in I \cup F\}$, and $Q = \max\{R^v(x_\tau) : v \in I \cup F\}$. Thus, in the latter case, ModifiedPump will indeed find some sets $I_\tau$ and $F_\tau$, and hence terminate for some finite $\tau$. □


Lemma 8 Procedure ModifiedPump($\varepsilon$, $x$, $S$) either shrinks the $m$-range by a factor of 3/4 or outputs potentials $x = x_\tau$ and sets $I = I_\tau$ and $F = F_\tau$ which satisfy conditions (N4)-(N6) with $a' < b'$.

Proof When the procedure terminates without shrinking the $m$-range, then it outputs sets $I = I_\tau$ and $F = F_\tau$ such that in the auxiliary graph $G$ there are no arcs leaving these sets. Since $I \subseteq P_\tau$ and $F \subseteq V \setminus P_\tau$, condition (N4) holds with $a' = \max_{v \notin P_\tau} m^v(x_\tau) < b' = (m^+ + m^-)/2$. Furthermore, the lack of leaving arcs in $G$ implies that for all $(v, u)$ with $v \in I$ and $u \notin I$, and also for all $(u, v)$ with $u \in F$ and $v \notin F$, we must have the reverse inequalities in (7b), implying that conditions (N5) and (N6) hold. □

Let us observe that the bounds and strategies obtained above do not necessarily imply the $\varepsilon$-non-ergodicity of the game, since the positions in $I_\tau$ and $F_\tau$ may not have enough separation in $m$-values (i.e., the condition $b' - a' > 3\varepsilon$ in Lemma 6 may not be satisfied). To fix this we need to make one more use of the pumping algorithm, as described in the ModifiedRepeatedPumping procedure below. After each range-shrinking in this algorithm, we use a routine called ReducePotential($\Gamma$, $x$, $m^-$, $m^+$) which takes the current potential vector $x$ and range $[m^-, m^+]$ and produces another potential vector $y$ whose norm $\|y\|_\infty$ is bounded. We need to do this because, as the algorithm proceeds, the potentials, and hence the transformed rewards, might grow doubly exponentially.

The potential reduction can be done as follows. We write the following quadratic program in the variables $x' \in \mathbb{R}^V$, $\alpha = (\alpha^v \mid v \in V) \in \mathcal{K}(\Gamma)$, and $\beta = (\beta^v \mid v \in V) \in \mathcal{L}(\Gamma)$:

\[ \begin{aligned}
\alpha^v A^v(x') &\ge m^- \cdot e, & A^v(x') \beta^v &\le m^+ \cdot e, \\
\alpha^v e &= 1, & e \beta^v &= 1, \\
\alpha^v &\ge 0, & \beta^v &\ge 0,
\end{aligned} \tag{12} \]

for all $v \in V$, where $e$ denotes the vector of all ones of appropriate dimension. This is a quadratic system of at most $6nN$ (in)equalities on at most $(2N + 1)n$ variables. Moreover the system is feasible since the original potential vector $x$ satisfies it. Thus, a rational approximation to the solution to within an additive accuracy of $\delta$ can be computed, using quantifier elimination algorithms, in time
poly(7;,iV'^(’"^),logi); see |BPR961 KlVSSl IRen92| . Note that the resulting 
solution will satisfy (12) but within the approximate range $[m^- - \delta, m^+ + \delta]$. By choosing $\delta$ sufficiently smaller than the desired accuracy $\varepsilon$, we can ignore the effect of such approximation.

Lemma 9 ModifiedRepeatedPumping($\varepsilon$) terminates in a finite number $h \le \log\frac{R}{24\varepsilon} / \log\frac{8}{7}$ of iterations, and either provides a potential transformation proving that the game is $24\varepsilon$-ergodic, or outputs two nonempty subsets $I$ and $F$ and strategies $\alpha^v$, $v \in I$, for player 1 and $\beta^v$, $v \in F$, for player 2 such that conditions (N4), (N5) and (N6) hold with $b', a'$ satisfying the condition in Lemma 6.

Proof Let us note that if $T_\tau = \emptyset$ after the second ModifiedPump call, then the range of the $m$-values has shrunk by a factor of $\frac{7}{8}$ (at least), while if this happens in the first stage the $m$-range has shrunk by a factor of 3/4.

On the other hand if the $m$-range is not shrinking, and we have $B_\tau = \emptyset$ after the second call of ModifiedPump, then we would also have $m^v(x_\tau) \ge \frac{5}{8} m^+ + \frac{3}{8} m^- = b'$ for all $v \in I$, while $m^u(x_\tau) < (m^+ + m^-)/2 = a'$ for all $u \in F$, and hence (N4)-(N6) hold with these $a'$ and $b'$ values. Since the $m$-range has not shrunk, we must have $m^+ - m^- > 24\varepsilon$, and hence $b' - a' = \frac{1}{8}(m^+ - m^-) > 3\varepsilon$ follows. (Note that, since in the second stage we pump only positions in $I_\tau$, the potentials of these positions may go down, while those of the positions outside $I_\tau$ remain unchanged, and hence condition (N5) remains satisfied.)

Finally, if the $m$-range is not shrinking, and the second call returns a new set $I_\tau$, then all $m$-values of this set are at least $\frac{3}{4} m^+ + \frac{1}{4} m^- > \frac{5}{8} m^+ + \frac{3}{8} m^- = b'$, and with the same set $F$ we can conclude again that conditions (N4)-(N6) hold. □


To complete the proof of Theorem [l] we need to analyze the time complexity 
of the above procedure, in particular, bounding the number of pumping steps 
performed in ModifiedPump. 
Algorithm 3 MODIFIEDREPEATEDPUMPING($\epsilon$)

1: Initialize $h := 0$, and $x_h := 0 \in \mathbb{R}^V$
2: Set $m^+(h) := \max_{v \in V} m^v(x_h)$ and $m^-(h) := \min_{v \in V} m^v(x_h)$.
3: if $m^+(h) - m^-(h) < 24\epsilon$ then
4:    return $x_h$.
5: end if
6: $x_{h+1} :=$ MODIFIEDPUMP($\epsilon, x_h, V, m_-, m_+$) and let $F_\tau, I_\tau, T_\tau, B_\tau, P_\tau$ be the
   sets obtained from ModifiedPump.
7: if $T_\tau = \emptyset$ or $B_\tau = \emptyset$ then
8:    $x_{h+1} :=$ REDUCEPOTENTIAL($\Gamma, x_\tau, m^-(h), m^+(h)$)
9:    Set $h := h + 1$ and Goto step 2
10: end if
11: Otherwise set $F = F_\tau$ and $I = I_\tau$.
12: $x_{h+1} :=$ MODIFIEDPUMP($\epsilon, \ldots, m_+$) and let $T_\tau, B_\tau$ be the sets obtained
    from this call of ModifiedPump.
13: if $T_\tau = \emptyset$ then
14:    $x_{h+1} :=$ REDUCEPOTENTIAL($\Gamma, x_\tau, m^-(h), m^+(h)$)
15:    Set $h := h + 1$ and Goto step 2
16: end if
17: if $B_\tau = \emptyset$ then
18:    Goto step 21
19: end if
20: Otherwise, update $I := I_\tau$.
21: return $x_{h+1}$ and the sets $I$ and $F$.
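
The control flow of Algorithm 3 can be sketched in Python as follows. This is only an illustration of the branching structure, not an implementation of the pumping procedure: `m_range`, `first_pump`, `second_pump` and `reduce_potential` are hypothetical callables standing in for the computation of $(m^-, m^+)$, the two ModifiedPump calls and ReducePotential, and their signatures are assumptions made for this sketch.

```python
def modified_repeated_pumping(eps, x, m_range, first_pump, second_pump,
                              reduce_potential, max_rounds=10_000):
    """Branching skeleton of Algorithm 3 (all callables are hypothetical).

    Returns (x, None, None) when the m-range is below 24*eps, and
    (x, I, F) when separating sets I and F are found.
    """
    for _ in range(max_rounds):
        m_lo, m_hi = m_range(x)                          # step 2
        if m_hi - m_lo < 24 * eps:                       # steps 3-5
            return x, None, None
        x, F, I, T, B = first_pump(eps, x, m_lo, m_hi)   # step 6
        if not T or not B:                               # steps 7-10
            x = reduce_potential(x, m_lo, m_hi)
            continue                                     # back to step 2
        x, I_new, T, B = second_pump(eps, x, I, m_lo, m_hi)  # step 12
        if not T:                                        # steps 13-16
            x = reduce_potential(x, m_lo, m_hi)
            continue                                     # back to step 2
        if B:                                            # steps 17-20
            I = I_new                                    # update I
        return x, I, F                                   # step 21
    raise RuntimeError("round limit reached without termination")
```

The `max_rounds` guard is an addition of the sketch; in the paper, termination is guaranteed by Lemma 9 rather than by an explicit limit.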
Let us note that as long as $m^+ - m^- > 24\epsilon$ we pump the upper half $P_\tau$
by exactly $\delta > 6\epsilon$. Let $\mathcal{P}_\tau(v)$ (resp., $\mathcal{N}_\tau(v)$) denote the number of iterations,
among the first $\tau$, in which position $v$ was pumped, that is, $v \in P_t$ (resp., not
pumped, that is, $v \notin P_t$).

Let us next sort the positions $v \in V$ such that we have

$x_\tau^{v_1} \le \cdots \le x_\tau^{v_n}$,

and write $\Delta_j = x_\tau^{v_{j+1}} - x_\tau^{v_j}$ for $j = 1, 2, \ldots, n - 1$. Note that
$\mathcal{P}_\tau(v_1) = \tau$ and $\mathcal{N}_\tau(v_n) = \tau$.
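
The sorting step above, and the search for the first large gap that the analysis relies on, can be illustrated with a short Python sketch. The potentials are given as a dictionary from positions to numbers. The `threshold` callable is hypothetical: it maps the sum of the earlier gaps to a bound, and it does not reproduce the exact threshold used in the paper.

```python
from typing import Callable, Dict, Hashable, List, Optional, Tuple


def sorted_gaps(x: Dict[Hashable, float]) -> Tuple[List[Hashable], List[float]]:
    """Sort positions by potential and return (order, gaps).

    gaps[j-1] is the difference between the (j+1)-st and j-th smallest
    potentials, for j = 1, ..., n-1.
    """
    order = sorted(x, key=lambda v: x[v])
    gaps = [x[order[j + 1]] - x[order[j]] for j in range(len(order) - 1)]
    return order, gaps


def first_large_gap(gaps: List[float],
                    threshold: Callable[[float], float]) -> Optional[int]:
    """Smallest 1-based index i whose gap exceeds threshold(sum of earlier gaps)."""
    prefix = 0.0
    for i, gap in enumerate(gaps, start=1):
        if gap > threshold(prefix):
            return i
        prefix += gap
    return None  # no gap is large relative to the gaps before it
```

Because the threshold depends on the accumulated earlier gaps, a gap counts as large only relative to the spread of the potentials below it, which is the shape of the recurrence solved in the analysis.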

Let $i_\tau$ be the largest index in $\{1, 2, \ldots, n\}$ such that $v_{i_\tau} \in P_\tau$. Then, by (5),
we have for $i = 0, 1, 2, \ldots, i_\tau - 1$ that


0 < (xt) < R + Aj, (13) 

1=1 

where the sum over the empty set is zero by definition. Similarly, for
$i = i_\tau + 1, \ldots, n$, we have


- R< h'"f!^{Xr) < i? + ^ ^n-j- 
1=1 


From (13) and (14), it follows that

< <! 


for 1 = 0,1, 2,..., V — 1 


R + An-j, for 1 = V, V + 1, ■ • ■ ,11 - 1. 


Let It be the smallest index i such that 


nw{r+y:-=\^3? 

A, >-^- 


and let i-r be the largest index i < n — 1 such that 

NWiR + YTiZr" 


A, > 


(14) 


(15) 


(16) 


(17) 


From the definition of ir, we know that 


A, < 


NW{R + ET=\^j)^ 

e 


for all! = 1,..., It- ~ 1- 


Solving this recurrence, we get 


V-r 

Xr^ —X 


vi 

T 


■L-r — i- 



l)lVlFi?Y 


/nNWR 
itr-lfR< ( -^- 


2^-1 

n^R. 

(18) 
Similarly, the definition of ir gives


A, < 


e 


for all z = v + 1,..., n — 1, 


from which follows 


X 


Vn 

T 




< 


^nNWR'^ 



(19) 


Note that if v < i-r then (ESI implies that taking It = {i;i ,,vj } would 
satisfy condition (N5) and guarantee that It Q Pt- 
Indeed, for all i < iT and u ^ It, we have 


>A~ > 


AM^(i? + E-=iA,)2 


> 


\L'’^\W{R^^{xT)f 


Similarly, having z,- > v guarantees that taking Ft = {uj _|_i, ■ • ■, u„} would 
satisfy (N6) and Ft D Pt = 0. 

, since for alH > v + 1 and u ^ Ft, we have 

. A . Arkk(i? + E;Ar'A„_,)2 ^ \K-^\w {R^H^r)f 

Xj. X^ ^ ZA • j> ^ 

T T It g g 

On the other hand, if v > v + 1, then El implies that was al¬ 
ways pumped except for at most k{R) := ^ iterations, that is, 

A/’r('Ci^+i) < k.{R). Also, since 0 Pt, then at time r, Ui^+i is not pumped. 
Similarly, if i,- < v, then El implies that was never pumped except for 

at most k{R) iterations, that is, ^ while it is pumped at time r. 

Since we have at most n candidates for each of and it follows that after 

T = 2nK{R) + 1, neither of these events (it > + 1 and i,- < It) can happen, 

which by our earlier observations implies that the algorithm constructs the sets 
$I_\tau$ and $F_\tau$. We can conclude that MODIFIEDPUMP($\epsilon, x, V$) must terminate in
at most $2n\kappa(R) + 1$ iterations, either producing $m^+ - m^- < 24\epsilon$ or outputting
the subsets $I_\tau$ and $F_\tau$ proving $\epsilon$-non-ergodicity.

It remains now to bound the running time for the second call of ModifiedPump
(line 12), and the running time for each iteration of
MODIFIEDREPEATEDPUMPING($\epsilon$). We can repeat essentially the same analysis as above, assuming
that we modify the rewards with the potential vector obtained up to this point in
time. Since, by the above argument, the maximum potential difference between
any two vertices at time $\tau$, when we make the second call to ModifiedPump,
is at most $\delta(2n\kappa(R) + 1)$, it follows that the maximum absolute value
of the transformed rewards at time $\tau$ is at most $R_2 := R + \delta(2n\kappa(R) + 1)$
(note that the non-negativity of the rewards was only needed to bound $m_- \ge 0$
initially). It follows by the same argument as above that the second call of
ModifiedPump terminates in time $2n\kappa(R_2) + 1$.

After shrinking the m-range, we apply potential reductions, which guarantee
that the bit length of each entry in the potential vector is bounded by a polynomial
in the original bit length $\eta$. It follows that the new transformed rewards will have
absolute value bounded by a quantity $R_3$. We repeat the same argument
for the different phases of MODIFIEDREPEATEDPUMPING($\epsilon$) to arrive at the
running time claimed in Theorem 1.

This completes the proof of the theorem. □ 